1343edb2739a61a6e20bd8764e814b50-Supplemental-Conference.pdf

Neural Information Processing Systems

The appendix is organized as follows: In Sec. A1, we provide a generalization of Claim 1 and its complete proof. In Sec. A2, we provide the complete proof for Eq. In Sec. A3, we provide additional ablations and experimental results. In Sec. A4, we provide additional implementation details.


GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems

arXiv.org Machine Learning

Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and a local certification radius, derived from the decoder Jacobian and a Lipschitz bound, quantifies the geometric robustness of efficiency scores. We validate GeMA on synthetic data with non-convex frontiers, heterogeneous technologies and scale bias, and on four real-world case studies: global urban rail systems (COMET), British rail operators (ORR), national economies (Penn World Table) and a high-frequency wind-farm dataset. Across these domains GeMA behaves comparably to established methods when classical assumptions hold, and provides additional insight in settings with pronounced heterogeneity, non-convexity or size-related bias.


$\ell_1$-regression with Heavy-tailed Distributions

Neural Information Processing Systems

In this paper, we consider the problem of linear regression with heavy-tailed distributions. Different from previous studies that use the squared loss to measure the performance, we choose the absolute loss, which is capable of estimating the conditional median. To address the challenge that both the input and output could be heavy-tailed, we propose a truncated minimization problem, and demonstrate that it enjoys an $O(\sqrt{d/n})$ excess risk, where $d$ is the dimensionality and $n$ is the number of samples. Compared with traditional work on $\ell_1$-regression, the main advantage of our result is that we achieve a high-probability risk bound without exponential moment conditions on the input and output. Furthermore, if the input is bounded, we show that the classical empirical risk minimization is competent for $\ell_1$-regression even when the output is heavy-tailed.
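The idea of a truncated absolute-loss objective can be illustrated with a minimal sketch. The abstract does not specify the truncation, so everything concrete here is an assumption made for illustration: the Catoni-style function $\psi(x) = \log(1 + x + x^2/2)$, the parameter `alpha`, the plain subgradient descent solver, and all function names. It minimizes $\frac{1}{n\alpha}\sum_i \psi(\alpha\,|y_i - x_i^\top w|)$, which behaves like the absolute loss for small residuals and grows only logarithmically for very large ones.

```python
import math

def psi_prime(x):
    """Derivative of the assumed truncation psi(x) = log(1 + x + x^2/2), x >= 0."""
    return (1.0 + x) / (1.0 + x + 0.5 * x * x)

def truncated_l1_objective(w, X, y, alpha):
    """(1 / (n * alpha)) * sum_i psi(alpha * |y_i - <x_i, w>|)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        r = abs(y_i - sum(a * b for a, b in zip(x_i, w)))
        total += math.log1p(alpha * r + 0.5 * (alpha * r) ** 2)
    return total / (len(y) * alpha)

def truncated_l1_fit(X, y, alpha=0.1, lr=0.05, steps=500):
    """Subgradient descent on the truncated absolute loss (illustrative solver)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x_i, y_i in zip(X, y):
            r = y_i - sum(a * b for a, b in zip(x_i, w))
            if r == 0.0:
                continue  # 0 is a valid subgradient of |r| at r = 0
            # Chain rule: d/dw [psi(alpha*|r|) / alpha] = -psi'(alpha*|r|) * sign(r) * x_i
            coef = -psi_prime(alpha * abs(r)) * math.copysign(1.0, r)
            for j in range(d):
                grad[j] += coef * x_i[j] / n
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w
```

Because `psi_prime` shrinks toward zero as the residual grows, a single extreme observation contributes far less to the update than it would under the plain absolute loss. Setting `alpha` close to zero makes `psi_prime` close to 1 everywhere, which recovers ordinary least-absolute-deviations regression.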





The Evolution of Learning Algorithms for Artificial Neural Networks

arXiv.org Artificial Intelligence

In this paper we investigate a neural network model in which weights between computational nodes are modified according to a local learning rule. To determine whether local learning rules are sufficient for learning, we encode the network architectures and learning dynamics genetically and then apply selection pressure to evolve networks capable of learning the four boolean functions of one variable. The successful networks are analysed and we show how learning behaviour emerges as a distributed property of the entire network. Finally the utility of genetic algorithms as a tool of discovery is discussed.



The Description Length of Deep Learning models

Neural Information Processing Systems

Solomonoff's general theory of inference (Solomonoff, 1964) and the Minimum Description Length principle (Grünwald, 2007; Rissanen, 2007) formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks (Hinton and Van Camp, 1993; Schmidhuber, 1997). Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff's approach.
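The incremental encoding idea can be sketched in a few lines. This is a toy illustration, not the paper's setup: a label-frequency predictor with add-one smoothing stands in for the deep network, and the function names are hypothetical. Each symbol is encoded with a model fitted only on the symbols already sent, so the receiver can rebuild the same model and no parameters need to be transmitted.

```python
import math
from collections import Counter

def prequential_codelength(labels, num_classes):
    """Bits needed to send `labels` one at a time, each encoded with a
    model fitted only on the labels already sent (add-one smoothing)."""
    counts = Counter()
    bits = 0.0
    for t, y in enumerate(labels):
        # Predictive probability of y before it is revealed.
        p = (counts[y] + 1) / (t + num_classes)
        bits += -math.log2(p)
        counts[y] += 1  # update the model on the newly revealed label
    return bits

def uniform_codelength(labels, num_classes):
    """Baseline: every label costs log2(num_classes) bits."""
    return len(labels) * math.log2(num_classes)
```

On a skewed sequence such as 90 zeros and 10 ones, `prequential_codelength` is shorter than `uniform_codelength`, because the predictor learns the class imbalance as it goes. The total is the sum of $-\log_2 p(y_t \mid y_{<t})$ over the sequence, which is the quantity an incremental code charges for the data.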